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Abstract. BufFon's needle experiment is well-known: take a plane on which parallel lines at 
unit distance one from the next have been marked, throw a needle of unit length at random, and, 
finally, declare the experiment a success if the needle intersects one of the lines. Basic calculus 
implies that the probability of success is f = 0.63661, and the experiment can be regarded as an 
analog (i.e., continuous) device that stochastically "computes" ^. GeneraHzing the experiment 
and simplifying the computational framework, we ask ourselves which probability distributions 
can be produced perfectly, from a discrete source of unbiased coin flips. We describe and analyse 
a few simple Buffon machines that can generate geometric, Poisson, and logarithmic-series distri- 
butions (these are in particular required to transform continuous Boltzmann samplers of classical 
combinatorial structures into purely discrete random generators). Say that a number is Buffon if 
it is the probability of success of a probabilistic experiment based on discrete coin flippings. We 
provide human-accessible Buffon machines, which require a dozen coin flips or less, on average, and 
produce experiments whose probabilities are expressible in terms of numbers such as tt, exp(— 1), 
log 2, v^, cos(i), C(5). More generally, we develop a collection of constructions based on simple 
probabilistic mechanism,s that enable one to create Buffon experiments involving compositions of 
exponentials and logarithms, polylogarithms, direct and inverse trigonometric functions, algebraic 
and hypergeometric functions, as well as functions defined by integrals, such as the Gaussian error 
function. 
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Figure 1. The main probability laws: support, expression of the distribution, proba- 
bility generating function (PGF), and naming convention for generators. 



The BufFon experiment described in the abstract is modelled by two variates, J7, V, uniformly 
distributed over the unit interval [0, 1] (representing the initial position of the centre of the needle 
modulo 1 and its orientation measured as a fraction of 180 degrees) . The condition for intersection 
is that the closest distance min(?7, 1 — [/) to a parallel line be at most half the length ^ sin(7rF) 
of the perpendicular projection of the needle; that is, the predicate min(J7, 1 — t/) < 5 sin(7ry). 
Simple calculations then justify the ^ probability evaluation. 

One can regard BufFon's experiment as a simple analog device that takes as input uniform 
[0, l]-random variables and outputs a discrete {0, l}-random variable, with 1 for success and for 
failure. What are permitted are exact operations over real numbers; here, computing a minimum, 
multiplying by 1/2, computing x t—^ sin(7ra;), and determining inequalities between reals. Without 
attempting a formal definition, we shall name real Bujjon machine, any abstract device that has 
internal registers capable of storing infinite-precision real numbers, that is equipped with a fixed 
set of base functions (e.g., ±, max, and sin), and is capable of testing signs of numbers. The 
classical problem of simulating random variables can then be described as the contruction of real 
BufFon machines that are both simple and computationally efficient. We refer to the encyclopedic 
treatise of Devroye |1] for many examples related to the simulation of distributions of various 
types, such as Gaussian, exponential, Cauchy, and stable. 

Discrete Buffon machines. Our objective here is the perfect simulation of discrete ran- 
dom variables, i.e., variables supported by Z or one of its subsets: see Fig. [TJfor those we shall 
be considering. It is then natural to restrict the source of randomness to produce only uniform, 
random bits provided by a procedure flip= random{o,i}- Since we are interested in finite com- 
putation (rather than infinite-precision real numbers), we introduce discrete Buffon machines, or 
discrete generators, whose registers can contain binary sequences of finite length ("strings"). A 
fundamental question is then the following. 

How to generate discrete distributions exactly and efficiently, using only a discrete 
source of random bits and finitary computations? 

A pioneering work in this direction is that of Knuth and Yao |17j who discuss the power of various 
restricted devices (e.g., finite-state machines). Knuth's treatment [151 §3-4] and the articles [91131] 
present additional results along these lines. 

Our original motivation for studying discrete BufFon machines came from Boltzmann samplers 
for combinatorial structures, following the approach of Duchon, Flajolet, Louchard, and Schaef- 
fer [5] and Flajolet, Fusy, and Pivoteau [7]. The current implementations relie on real number 
computations, and the need arises to generate distributions such as geometric, Poisson, or loga- 
rithmic, with various ranges of parameters — since the objects ultimately produced are discrete, it 
is natural to try and produce them by purely discrete means. 

Buffon numbers. Here is an intriguing range of related questions. Let be a discrete BufFon 
machine that generates a Bernoulli random variable X, whose value lies in {0,1}. We say that 
Al is a Buffon computer for the number p :— V{X = 1). A number is said to be a Buffon number 
if there is a BufFon machine that computes it stochastically. We will be explicitly restricting 
attention to devices that have as simple a structure as seems possible. The question then arises 
as to which real numbers are BufFon numbers in that sense. In other words, we seek whether there 
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exist simple finite mechanisms that transform uniform {0,l}-bits into Bernoulli variables whose 
probabilities of success are numbers such as 

(1) I/V2, e-\ log2, ^-3, 

TT e — 1 

This problem can be seen as a vast generalization of BufFon's needle problem. 

Complexity and simplicity. For our discrete Buffon generators and computers, we add the 
requirement of operating with a constant expected number of coin flippings. We will also impose for 
ourselves the losely defined constraint that the programs are short and conceptually simple, to the 
extent of being easily implemented by a human. Thus, emulating infinite-precision computation 
with multiprecision interval arithmetics or appealing to functions of high complexity as primitives 
is disallowed. Our programs then only make use of simple integer counters, string registers 
and stacks (Q, as well as "bags" (Q. The reader may try her hand at determining BufFon-ness 
in this sense of some of the numbers listed in ([T]). As briefly discussed in Section |5] our universal 
interpreter has less than 60 lines of Maple code, being built on primitives of between one and a 



dozen lines. We shall for instance later produce a Buffon computer for the constant (see (111) 

7 TT^ 1 oc ^ 

(2) C - -C(3)-- log 2+ -(log 2)3- 0.53721 13193 C(s) := E 

n — 1 



one that is human-compatible, that consumes on average less than 6 coin flips, and that requires 
at most 20 coin flips in 95% of the simulations. 

Warning. Due to space limitations in this extended abstract, we focus the discussion on algorithmic 
design. Analytic estimates can be approached by means of (probability, counting) generating functions 



in the style of methods of analytic combinatorics [TO]; see, the typical discussion in Subsection 2.1 The 
main results of this note are Theorem [2] (Poisson and logarithmic generators), Theorem [3] (realizability of 
exps, logs, and trigs). Theorem [s] (general integrator), and Theorem [7] (inverse trig functions). 

1. Framework and examples 

Our purpose is to set up a system based on the composition of simple probabilistic experiments, 
corresponding to simple computing devices. The framework of our Buffon machines allows for a 
finite control, with conditionals and tests, as well as unbounded iterations (do-loops). The basic 
data structures used here are: (z) integer registers (subject to addition of ±1 and test to 0); 
(ii) stacks, in the usual sense of automata theory; (Hi) string registers, which are stacks with 
some additional operations permitted (e.g., clear from current position to the top); (iv) bags, 
which are partly specified string registers with local modification of a symbol being allowed. The 
Buffon machines we consider are supposed to be equipped with an unbiased random-bit generatoi^ 
named "flip". The cost measure is taken to be the number of coin flips. Since we only consider 
restricted devices, the overall simulation cost is, in most cases, proportional to this measure. 

Deflnition 1. Given a class M of Buffon machines, we say that the function A '—^ ^(A), defined 
for A G (0,1) and with values (t>{X) € (0,1) is realizable (or has a simulation^ if there is a 
machine in M., such that, when equipped with a perfect generator T'Q{\) of a Bernoulli variable of 
parameter A — P(l), it outputs a Bernoulli random variable of parameter (/)(A). The function 0(A) 
is said to be strongly realizable (or have a strong simulation^ if the number of coin flips C admits 
exponential tails: for any e > 0, there are constants K and p < \ so that, for any A € [e, 1 — e], 
one has P(C > m) < Kp-''\ 

If one allows totally unrestricted machines, every computable real number of (0, 1) can be 
simulated (e.g., make use of continued fraction convergents or dyadic approximations). Similarly, 
Nacu and Peres [24 show that essentially every (computable) Lipschitz function is realizable 
and every (computable) real-analytic function has a strong simulation, with such unrestricted 
devices. Rather, our purpose here is to show how sophisticated perfect simulation algorithms can 

^This convention entails no loss of generality. Indeed, as first observed by von Neumann, suppose we only have 
a biased coin, where P(l) = p, P(0) = 1 — p, with p S (0, 1). Then, one should toss the coin twice: if 01 is observed, 
then output 0; if 10 is observed, then output 1; otherwise, start again another round of coin tossings. 
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be systematically and efficiently synthetized by composition of simple probabilistic processes, this 
without a need for any multiprecision arithmetic routines. To set the stage of the present study, 
we shall briefly discuss in sequence: (i) decision trees and polynomial functions; (ii) Markov chains 
(finite graphs) and rational functions. 

Decision trees and polynomials. Given three machines P, Q, R with outputs in {0, 1} and 
with Pp(l) — p, IPq(1) — q, ]Pfl(l) = r, we can easily build machines, corresponding to loopless 
programs, whose probability of success is p ■ q, 1 — p, or any composition thereof: see below for 
the most important boolean primitives: 

Name realization function 

Conjunction (P A Q) if P() = 1 then return((5()) else return(O) p A q = p ■ q 

Disjunction (P V Q) if P() = then return(Q()) else return(l) p\/ q = p + q — pq 

(3) Complementation (^P) if P() = then return(l) else return(O) 1 — P 
Squaring (P A P) 

Conditional (P ^ P | Q) if RQ = 1 then return(P()) else return(Q()) rp + (1 - r)q. 

Mean if flip() then return(P()) else return(Q()) ^{p + q)- 

We can then simulate a Bernoulli of parameter any dyadic rational s/2*, starting from the unbiased 
flip procedure, performing t draws, and declaring a success in s designated cases. Also, by calling 
a Bernoulli variate of parameter x a fixed number m of times, then observing the sequence of 
outcomes and its number k of Is, we can realize any polynomial function a„i,A;a;'°(l — a:)™"'^, with 
am,k any integer satisfying < am.k < (™)- 

Finite graphs (Markov chains) and rational functions. We consider now programs that 
allow iteration and correspond to a finite control graph — these are equivalent to Markov chains, 
possibly parametrized by "boxes" representing calls to external Bernoulli generators. In this way, 
we can produce a Bernoulli generator of any rational parameter A € Q, by combining a simpler 
dyadic generator with iteration. (For instance, to get a rB(|), flip an unbiased coin twice; return 
success if 11 is observed, failure in case of 01 or 10, and redo the experiment in case of 00.) 
Clearly, only rational functions can be realized by finite graphs. A highly useful function is the 
even-parity: 

(4) [even-parity] do { if P() then return(l); if P() = then return(O) }. 

The probability of fc + 1 calls to P{) is (1 — p)p'^, which, when summed over = 0, 2, 4, . . ., gives 
the probability of success as 1/(1 + p); thus, this function is realizable. Here are furthermore 
two important characterizations based on works of Nacu-Peres [21] (see also Wastlund [30]) and 
Mossel-Peres [53], after adaptation to our framework. 

Theorem 1 ( |23l I24 [ 130] ). (i) Any polynomial f{x) with rational coefficients that maps (0,1) 
into (0, 1) is strongly realizable by a finite graph, {ii) Any rational function f{x) with rational 
coefficients that maps (0,1) into (0, 1) is strongly realizable by a finite graph. 

(Both parts relie on the possibility of generating arbitrary rational mixtures of distributions. 
Part (i) is further based on a theorem of Polya, relative to nonnegative polynomials; Part (ii) 
depends on an ingenious "block simulation" principle.) 

We remark at this stage that we can also produce a geometric variate from a Bernoulli of the 
same parameter: just repeat till failure. This gives rise to the program 

(5) [Geometric] rG(A) := { K := 0; do { if rB(A) = then return(A'); K ~ K + 1;}}. 

(The even-parity construction implicitly makes use of this.) The particular rG(^) then simply 
iterates on the basis of a flip. Also, if we have access in some way to the complete binary expansion 
of p, we have a Bernoulli generator as 

(6) [Bernoulli/binary] rB(p) ~ { let Z 1 + rG(|); return(bit(Z,p)) }. 

In words: in order to draw a Bernoulli variable of parameter p, return the bit of p whose random 
index is given by a geometric variable of parameter 1/2. (Proof: emulate a comparison between a 
uniformly random U G [0, 1] with p £ [0, 1].) This property gives us a rB(p) for p G Q by means 
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rVN[P](A) — { do { 
N := rG(A); 

let U := (C/i, . . . , Un) be a vector of [0, l]-uniform variables. 

{ bits of the Uj are produced on a call-by-need basis to determine a and r } 

let T := trie(U); let a :— type(U); { computed from the trie r } 

it a epN then return(iV) } }. 
Figure 2. The von Neumann schema rVN['P](A), in its continuous version, relative 
to a class of permutations V and a parameter A (equivalently given by its Bernoulli 
generator rB(A)). 



of a simple Markov chain, based on the eventual periodicity of the representation of p. The cost 
is plainly a geometric of rate 1/2. (The construction will prove valuable when we discuss "bags".) 

2. The von Neumann schema 

The idea of generating certain probability distributions by means of their Taylor expansion 
seems to go back to von Neumann. An early application discussed by Knuth and Yao |17j . fol- 
lowed by Flajolet and Saheb [S], is the exact simulation of an exponential variate by means of 
random [0, l]-uniform variates, this by a "continuation" process that altogether avoids multipreci- 
sion operations. In this section, we build upon von Neumann's idea and introduce in Subsection |2.1| 
a general schema for random generation — the von Neumann schema. We then explain in Subsec- 
tion |2.2| how this schema may be adapted to realize classical transcendental functions, such as 
e^^, cos(A), only knowing a generator rB(A). 

2.1. Von Neumann generators of discrete distributions. First a few notations, follow- 
ing [ini. Start from a class P of permutations, with Vn the subset of permutations of size n 
and Pn the cardinality of Vn- Introduce the (counting) exponential generating function, or EGF, 

P{z) ■■^Y.pA- 

n>0 

For instance, the classes Q,TZ,S of, respectively, all permutations, sorted permutations, and cyclic 
permutations have egfs given by Q{z) = R{z) = e^, S{z) = log since Qn = i?„ = 1, 
Sn = {n — 1)1, for n > 1. Observe that the class S of cyclic permutation is isomorphic to the class 
of permutations such that the maximum occurs in the first position: Ui > U2, ■ ■ ■ , Un- (It suffices 
to "break" a cycle at its maximum clement.) We shall also denote by S the latter class, which is 
easier to handle algorithmically. 

Let U = (Ui, . . . , Un) be a vector of real numbers. By replacing each Uj by its rank in U, 
we obtain a permutation a = (tJi, . . . ,cr„), which is called the (order) type of U. For instance, 
U = (1.41,0.57,3.14,2.71) has order type a = (2,1,4,3). We shall write type(U) for the type 
of the vector U. The von Neumann schema, relative to a class V of permutation is described 
in Figure [2] Observe that it only needs a geometric generator rG(A), hence, eventually, only 
a Bernoulli generator rB(A). (The latter can result either from a basic generator that has the 
parameter A presented in binary on an auxiliary input tape or be itself constructed by composition 
rules such as those of Subsection [l] ) 

First, by construction, a value N is, at each stage of the iteration, chosen with probability (1 — 
A)A^. An iteration that succeeds then returns the value N = n with probability 

(1 - A)P„A"/n! ^ 1 PnA" 
(l-A)E„-PnA"/ri! P{X) n! ' 
For V one of the three classes Q, TZ, S described above this gives us three interesting distributions: 



permutations (V): 


all (Q) 


sorted (7?.) 


cyclic (S) 


distribution: 


(1 - A)A" 
geometric 


Poisson 


1 A" 

--, L:=log(l-A)-i 
L n 

logarithmic. 
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The case of all permutations is trivial, since no order-type restriction is involved, so that the first 
value of N £ Geo(A) is returned. It is notable that, in the other two cases, one can generate the 
Poisson and logarithmic distributions, by means of permutations obeying simple restrictions. 

Next, implementation details should be discussed. At the most general level, once N has 
been drawn, we can imagine producing the Uj in sequence, by generating at each stage only the 
minimal number of bits needed to distinguish any Uj from the other ones. This corresponds to 
the construction of a digital tree [Ml [5^ and is summarized by the command "set r := trie(C/)" 
in the schema of Figure [2] As the digital tree t is constructed, the Uj are gradually sorted, so that 
the order type tr can be determined — this involves no additional flips, just bookkeeping. The test 
cr G is of this nature and it requires no flip at all. Furthermore, in the case of the distributions 
of 0, one never needs to store the entire digital tree: it suffices to keep a single string register 
that preserves the value of current interest. For instance, for the logarithmic series distribution 
only the value of Ui ever needs to be preserved; for the Poisson distribution, only the value of a 
"current" Uk has to be maintained at any given time. 

Finally, we need to discuss costs. The cost of each iteration, as measured by the number of 
coin flips, is exactly that of generating the tree t of random size N . The number of coin flips to 
build T coincides with the path length of r, written uj{t), which satisfies the inductive definition 

(8) u{t) = \t\+lo{to) + uj{t^), |r|>2, 

where t = (tq, ti) and |r| is the size of r, that is, the number of elements contained in r. 

Path length is a much studied parameter, starting with the work of Knuth in the mid 1960s 
relative to the analysis of radix-exchange sort [ini pp. 128-134]; see also the books of Mahmoud [55] 
and Szpankowski [35] as well as Vallee et al.'s analyses under dynamical source models [31 [5S]. It 
is known that the expectation of path length, for n uniform infinite binary sequences, is finite, 
with exact value ^ 



oo 
k=0 



2 



and asymptotic form (given by a Mellin transform analysis [H |22l [28]): E„[a;] = ?^log2 n + 0{n). 

The distribution of path length is known to be asymptotically Gaussian, as n oo, after 
Jacquet and Regnier [T3] and Mahmoud et al. [^; see also [501 §11-2] for an exposition. For our 
purposes, it suffices to observe that the bivariate egf 

Hiz,q) :=f:E„r]J 

satisfies the nonlinear functional equation H{z,q) = H (^^,q^^ + z(l — q), with H(0,q) — 1. 
This equation fully determines H, since it is equivalent to a recurrence on coefficients; hn{q) :— 
n\[z^]H {z , q) , namely, for n > 2, 

1 1 /n\ 

(9) K{q) = ^ _ ^„^i„„ ^ (fc j hk{q)hn-k{q)- 

We can now summarize the foregoing discussion and state: 

Proposition 1. (i) Given a class V of permutations and a parameter A € (0, 1), the von Neumann 
schema rVN[7'](A) produces exactly a discrete random variable with probability distribution 

^ ' P(A) n\ 

ill) The number K of iterations has expectation 1/s, where s = (1 — A)P(A), and its distribution 
is \ + Geo(s). 

{Hi) The number C of flips consumed by the algorithm (not counting the ones in TG{X)) is a 
random variable with probability generating function 
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where H^,H are determined by 

H+{z, q)^{l~z)f2 ^K{q)z\ H-{z,q) - (1 - ^) £ f 1 - ^ ) hn{q)zr 



n=0 n=0 



The distribution of C has exponential tails. 

Proof. A crucial observation is that the digital tree created at each step of the iteration is only a 
function of the underlying set of values. But there is complete independence between this set of 



values and their order type. This serves to justify in particular (10 1. □ 



A simple consequence of Proposition [T] and (|7| is the possibility of generating a Poisson or 
logarithmic variate by a simple device: only one branch of the trie needs to be maintained. 

Theorem 2. The Poisson and logarithmic distributions of parameter A € (0, 1) have a strong 
simulation by a Buffon machine that only uses a single string register. 

Since the sum of two Poisson variates is Poisson (with rate the sum of the rates), the strong 
simulation result extends to any X £ Poi(A), for any A G ]R>o. This answers an open question 
of Knuth and Yao in [17l p. 426]. We may also stress here that the distributions of costs are 
easily computable: with a symbolic manipulation system, such as Maple, the cost of generating a 
Poisson(l/2) variate is found to have probability generating function (Item (Hi) of Proposition [l]) 

3 7 2 119 4 19 5 2023 . 179 7 

- H g H g H g H g H g H . 

4 128 4096 1024^ 131072^ 16384^ 

Interestingly enough, the analysis of the logarithmic generators involves ideas similar to those 
relative to a classical leader election protocol [SI [53] . 

2.2. BufFon computers: logarithms, exponentials, and trig functions. We can also take 
any of the previous constructions and specialize it by declaring a success whenever a special value 
A'^ = a is returned, for some predetermined a (usually a — 0, 1), declaring a failure, otherwise. 
For instance, the Poisson generator with a — gives us in this way a Bernoulli generator with 
parameter A' = exp(— A). Since the von Neumann machine only requires a Bernoulli generator 
rB(A), we thus have a purely discrete construction rB(A) — > FB (e^"^) . Similarly, the logarithmic 

generator restricted to a = 1 provides a construction FB(A) — > FB (^ iog(i^A)-i ) ■ Naturally, these 
constructions can be enriched by the basic ones of Section [l] in particular, complementation. 

Another possibility is to make use of the number K of iterations, which is a shifted geometric 
of rate s = (1 — A)P(A); see Proposition [T| Item (ii). If we declare a success when K = b, for some 
predetermined b, we then obtain yet another brand of generators. The Poisson generator used in 
this way with 6=1 gives us FB(A) — *■ FB ((1 — A)e^) , FB (Ae^~^) , where the latter involves 
an additional complementation. 

Trigonometric functions can also be brought into the game. A sequence U = ([/i, . . . , [/„) is 
said to be alternating ii Ui < U2 > U3 < U4 > ■ ■ ■ . It is well known that the egfs of the classes A'^ 
of even-sized and A~ of odd-sized permutations are respectively A~^{z) — sec(z) — l/cos(2;), and 
A~{z) ~ tan(z) = sm{z) /cos{z) . (This result, due to Desire Andre around 1880, is in most books 
on combinatorial enumeration, e.g., [101 HU |57].) Note that the property of being alternating can 
be tested sequentially: only a partial expansion of the current value of Uj needs to be stored at 
any given instant. By making use of the properties A'^,A~, with, respectively = 0, 1, we then 
obtain new trigonometric constructions. In summary: 

Theorem 3. The following functions admit a strong simulation: 

^ ^ (l-a;)log- , xlog(l/a;), 



log(l — log(l/a;)' 1 — x 

a;cot(x), (1 — x) cos(a;), (1 — a;) tan(a;) 

cos(a;) 
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2.3. Polylogarithmic constants. The probability that a vector U is such that Ui > U2, . . . , J7„ 

(the first element is largest) equals 1/n, a property that underlies the logarithmic series generator. 
By sequentially drawing r several such vectors and requiring success on all r trials, we deduce 
constructions for families involving the polylogarithmic functions, 



Li^(z) 

n=l 

with r £ Z>i. Of course, Lii(l/2) = log 2. There are a few special evaluations known for 
polylogarithmic values, when r > 2 (see the books by Berndt on Ramanujan Ch. 9] and by 
Lewin 0(13]). Of special interest for us are the evaluations 

(11) Li2(l/2) = ^-ilog^2, Li3(l/2)= ilog32-^log2+^C(3). 

By rational convex combinations, we obtain BufFon computers for |j and ^C(3)- Similarly, the 
celebrated BBP [Bailey-Borwcin-Plouffe] formulae [IJ can be implemented as BufFon machines. 

3. Square roots, algebraic, and hypergeometric functions 

We now examine a new brand of generators based on properties of ballot sequences, which open 
the way to new constructions, including an important square-root mechanism. The probability 
that, in 2n tosses of a fair coin, there are as many heads as tails is Wn = 2^Cn)- '^^^ property 
is easily testable with a single integer counter R subject only to the basic operation i? := i? ± 1 

and to the basic test R = 0. From this, one can build a square-root computer and, by repeating 
the test, certain hypergeometric constants can be obtained. 

3.1. Square-roots. Let iV be a random variable with distribution Geo(A). Assume we flip 
2N coins and return a success (1), if the score of heads and tails is balanced. The probabil- 
ity of success is 

00 



5(A) := 2^(1 - A)A"n7„ = VT^. 

n=0 

(The final simplification is due to the binomial expansion of (1 — x)"^/^.) This simple property 
gives rise to the square-root constructior^ 

TB (^r^) := { let iV := rG(A); 
(12) draw Xi,...,X2iv with X, G and P(-Kl) = P(-l) = 

set A := X]j=o ^j' if A = then return(l) else return(O) }. 

The mean number of coin flips used is then simply obtained by differentiation of generating 
functions. 



Theorem 4. The square-root construction of Equation (12 1 provides an exact Bernoulli generator 
of parameter \/l — A, given a rB(A). The mean number of coin flips required, not counting the 
ones involved in the calls to rB(A), is function \f\^^ is strongly realizable. 

By complementation of the original Bernoulli generator, we also have a construction rB(A) — > rB(l — 
A) — > TB I -\/A) , albeit one that is irregular at 0. 



Note 1. Computability with a pushdown automaton. It can be seen that the number A'^ in the square-root 
generator never needs to be stored. If we expand the code of the rG(A) generator that produces A'', we 
obtain the equivalent sequence of instructions 



rB (^1^) — { do { A — 0; 

if rB(A) = then break; 

if flip=l then A := A 4- 1 else A := A - 1; if flip=l then A — A + 1 else A — A - 1 } 
if A = then return(l) else return(O) }. 

In this way, only a stack of unary symbols needs to be maintained: the stack keeps track of the absolute 
value I A| stored in unary, the finite control can keep track of the sign of A. We thus have realizability of 
the square-root construction by means of a pushdown (stactz) automaton. 



^ Similar generators have been first developed by Wastlund |30l and by Mossel— Peres | 23| . 
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{let N ■- rG(A); 

draw w ■- X1X2 ■ ■ ■ Xn with Xj G {H, T} and r{H) = P(r) = |; 
if w £ L(G;5) then return(l) else retum(O) }. 

Figure 3. The algebraic construction associated to the pushdown automaton arising 
from a bistoch grammar. 



Note [T] suggests a number of variants of the square-root construction, which are also computable 
by a pushdown automaton. For instance, assume that, upon the condition "flip=l", one does 
A A + 2 (and still does A := A — 1 otherwise). The sequences of H (heads) and T (tails) 
that lead to eventual success (i.e., the value 1 is returned) correspond to lattice paths that are 
bridges with vertical steps in {+2, —1}; see [101 §VII.8.1]. The corresponding counting generating 
function is then S{z) = X]n>o (^n)-^^^' ^^'^ probability of success is (1 — \)S (|). As it is well 
known (via Lagrange inversion), the function S{z) is a variant of an algebraic function of degree 3; 
namely, S{z) = i_3^V(z)^ ' where Y{z) = z + zY{z)^, and Y{z) — z + z'^ + 3z'^ + ■ • • is a generating 
function of ternary trees. One can synthetize in the same way the family of algebraic functions 



S{z)^S^Hz)^J2C^ 

n>0 ^ 



tn 



by updating A with A := A + (t - 1). 

3.2. Algebraic functions and stochastic grammars. It is well known that unambiguous 
context-free grammars are associated with generating functions that are algebraic: see [101 for 
background (the Chomsky-Schiitzenberger Theorem). 

Definition 2. A binary stochastic grammar (a "bistoch") is a context-free grammar whose ter- 
minal alphabet is binary, conventionally {H, T}, where each production is of the form 

(13) X — > Hm + Tn, 

with m, n that are monomials in the non-terminal symbols. It is assumed that each non-terminal 
is the left hand side of at most one production. 

Let G, with axiom S, be a bistoch. We let L[G'; S] be the language associated with S. By the 
classical theory of formal languages and automata, this language can be recognized by a pushdown 
(stack) automata. The constraint that there is a single production for each non-terminal on the left 
means that the automaton corresponding to a bistoch is deterministic. (It is then a simple matter 
to come up with a recursive procedure that parses a word in {H, T}* according to a non-terminal 
symbols S.) In order to avoid trivialities, we assume that all non-terminals are "useful", meaning 
that each of them produces at least one word of {H, T}* . For instance, the one-terminal grammar 
y = Hyyy + T generates all Lukasiewicz codes of ternary trees [101 §L5.3] and is closely related 
to the previously seen construction. 

Next, we introduce the ordinary generating function (or OGf) of G and S, 

S{z) = E 

■wi^C[G;S] n>0 

where S'„ is the number of words of length n in L[G'; S]. The deterministic character of a bistoch 
grammar implies that the couting OGFs are bound by a system of equations (one for each non- 



terminal) deduced directly from (13), X{z) — zm -\- zn, where m, n are monomials in the OGFs 
corresponding to the non-terminals of m, n; see [TUI §1.5.4]. For instance we have, in the ternary 
tree case, Y — z + zY^ . 

Thus, any OGF y arising from a bistoch is a component of a system of polynomial equations, 
hence, an algebraic function. By elimination, the system can be reduced to a single polynomial 
equation P{z,y) — 0. We obtain, with a simple proof, a result of Mossel-Peres [531 Th. 1.2]: 

Theorem 5 ([23 ). To each bistoch grammar G and non-terminal S, there corresponds a con- 
struction (Figure^, which can be implemented by a deterministic pushdown automaton and calls 
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to a rB(A) and is of type rB(A) — > FB (5 (§)) , where S{z) is the algebraic function canonically 
associated with the grammar G and non-terminal S . 

Note 2. Stochastic grammars and positive algebraic functions. First, we observe that another way to 
describe the process is by means of a stochastic grammar with production rules X — > |m + |n, where 
each possibihty is weighted by its probability (1/2). Then fixing N — n amounts to conditioning on 
the size of the resulting object (n). This bears a superficial resemblance to branching processes, upon 
conditioning on the size of the total progeny, itself assumed to be finite. (The branching process may well 
be supercritical, as in the ternary tree case.) 

The algebraic generating functions that may arise from such grammars and positive systems of equations 
have been widely studied. Regarding their coefficients and singularities, we refer to the discussion of the 
Drmota-Lalley- Woods Theorem in |10J pp. 482-493] and references therein. Regarding the values of the 
generating functions, we mention the studies by Kiefer, Luttenberger, and Esparza 14 and by Pivoteau, 
Salvy, and Soria |25]. The former is motivated by the probabilistic verification of recursive Markov 
processes, the latter by the efficient implementation of Boltzmann samplers. 

It is not known however at this moment whether a function as simple as (1 — A)^^'^ is realizable by a 
stochastic context-free grammar or, equivalently, a deterministic pushdown automaton. (We conjecture a 
negative answer, as it seems that only square-root and iterated square-root singularities are possible for a 
wide class of positive polynomial systems.) 

3.3. Ramanujan, hypergeometrics, and a Buffon computer for I/tt. A subtitle might be: 
What to do if you want to perform Buffon's experiment but dont have needles, just coins? The 
identity 

1 _ ^ f2nVQn + l 

ri=0 ^ ^ 

due to Ramanujan (see [12 1 for related material), lends itself somewhat miraculously to evaluation 
by a simple BufFon computer. The following simple experiment (the probabilistic procedure Rama) 
succeeds (returns the value 1) with probabihty exactly I/tt. It thus constitutes a discrete analogue 
of Buffon's original, one with only three counters (T, a copy of T, and A). 

procedure Rama(); {returns the value 1 with probability I/tt} 

51. let T := X1+X2, where Xi,X2 € Geom(i); 

52. with probability § do T := T + 1; 

53. for j = 1,2, 3 do 

54. draw a sequence of 2T coin flippings; 

if (A = # Heads - # Tails) ^ then return(O); 

55. return(l). 

Observe that setting up the generation of a variable that is geometric with parameter ^ necessitates 
2 coin flips and then 2 coin flips per incrementation in the worst case (in fact 1^ coin flip on 
average) ; drawing a Bernoulli variable with probability | requires about 4 coin flips on average — 
this crude analysis takes care of steps SI and S2. The mean value of 5 is ^ = 1.222, so that 
typical values for 2S are in the set {0, 2, 4, 6, 8} (about 95% of the time). Finally, the for-loop of 
Step S3, is aborted as soon as the number of heads (or tails) differs from S, in which case the 
procedure returns 0, which means "failure" . It is only when the battery of the three tests in S4 
is successful that the procedure terminates returns the value 1, meaning "success" (in S5). On 
average, 9.8 coin flips are consumed by this experiment. (Similar hypergeometric constants can 
be produced by such devices.) 

4. A Buffon integrator 

Our purpose here is to develop, given a construction of type FB(A) — > FB((j!)(A)), a generator 
for the function 

(14) $(A) = \ I (l){w)dw. 

Jo 

An immediate consequence will be a generator for A<I'(A); that is, an "integrator". 



10 



p. FLAJOLET, M. PELLETIER, M. SOMA 



U 



Figure 4. The "geometric-bag" procedure bag{U) 
state (the pairs index-values and a partly filled register) and the complete Maple code. 



Ghalf :=proc() local K; 

# a geometric RV of param. 1/2 

K:=-l; do K:=K+1; if flip()=0 

then return(K) fi; od; 



=proc(U) local J; 

J:=l+Ghalf ; 

if type (U [J] ,name) 

then U[J] :=flip() ; fi; 
return (U [J] ) ; end; 
two graphic representations of a 



To start with, wc discuss a purely discrete implementation of rB(A) — > TB{UX), with 
U G [0,1], uniformly, where multiple invocations of rB(A) must involve the same value of U. 
Conceptually, it suffices to draw U as an infinite sequence of flips, then make use of this U to 
get a rB(J7) and then appeal to the conjunction (product) construction to get a rB(AC/) as 
rB([/) • rB(A). To implement this idea, it suffices to resort to lazy evaluation. One may think 
of U as a potentially infinite vector (ui, U2, . . .), where Vj represents the jth bit of U. Only a finite 
section of the vector is used at any given time and the Vj are initially undefined (or unspecified). 
Remember that a TB{U) is simply obtained by fetching the bit of U that is of order J, where 
J G 1 + Geo(|). In our relaxed lazy context, whenever such a bit Vj is fetched, we first examine 
whether it has already been assigned a {0, l}-value; if so, we return this value; if not, we first 
perform a fiip, assign it to Vj, and return the corresponding value: see Figure |4[ In a language 
like Maple, where it is possible to test whether a variable has been assigned, the implementation 
is obvious; otherwise, one should maintain an association list of the already "known" indices and 
values, and update it, as the need arises. 

Assume that 0(A) is realized by a Buffon machine that calls a Bernoulli generator rB(A). If 
we replace rB(A) by TB{XU), as described in the previous paragraph, we obtain a Bernoulli 
generator whose parameter is (t>{XU), where U is uniform over [0,1]. This is equivalent to a 
Bernoulli generator whose parameter is 0(Au) du = ^'(A), with <E>(A) as in (14 1. In summary: 



Theorem 6. Let (t>{X) be realizable by a Buffon machine M. Then the function <i>(A) — ^ 0(^^) dw 
is realizable by addition of a geometric bag to M. In particular, if (j){X) is realizable, then its in- 
tegral taken starting from is also realizable. If in addition 0(A) is analytic at 0, then its integral 
is strongly realizable. 



This result paves the way to a large number of derived constructions. For instance, starting 
from the parity construction, we obtain <&o('^) •— j Jq dw = log(l + A), hence, by product, a 
construction for log(l + A). When we now combine the parity construction with "squaring", where 
a TB{p) is replaced by the product TB{p) ■ TB{p), we obtain <i>i(A) := j = j arctan(A), 

hence also arctan(A). When use is made of the exponential (Poisson) construction A , one 

obtains (by squaring and after multiplication by A) a construction for <I>2(A) '■= /q^ e^™'/^ dw, so 
that the error function ("erf") is also realizable. Finally, the square-root construction combined 
with parity and integration provides <&3(A) := ^i+Z^ ^ ^-'^ \/l — A^ + arcsin(A), out of 
which we can construct | arcsin(A). In summary: 

Theorem 7. The following functions are strongly realizable (0 < a; < !).■ 

1 



log(l + a::), arctan(x), -arcsin(a;). 



The first two only require one bag; the third requires a bag and a stack; the fourth can be imple- 
mented with a string register and bag. 

Buffon computers for tt. Of special interest, in connection with (f>i, is the fact that <i>i(l) = 
arctan(l) — —. This gives us a Buffon computer for 7r/4. There are even further simplifications 
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due to the fact that rB(l) is trivial: this computer then only makes use of the U vector. Given its 
extreme simplicity, we can even list the complete code of this Madhava-Gregory-Leibniz (MGL) 
generator for 7r/4: 

MGL:=proc() do 

if bag(U)=0 then return (1) fi; if bag(U)=0 then return (1) fi; 

if bag(U)=0 then return(O) fi; if bag(U)=0 then return(O) fi; od; end. 

The Buffon computer based on arctan(l) works fine for small simulations. For instance, based on 
10,000 experiments, we infer the approximate value 7r/4 « 0.7876, whereas 7r/4 = 0.78539. In 
that particular run, the mean number of flips per experiment was about 27. However, values of 
U very close to 1 are occasionally generated (the more so, as the number of simulation increases) , 
and in fact the expected number of flips is infinite. This unwanted feature is somehow a reflection 
of the fact that, behind the identity 7r/4 = arctan(l), there is lurking the slowly convergent series 
representation (Madhava-Gregory-Leibniz) : j — j — ^ + ^ — ^ + -- -. 

The next idea is to consider formulae of a kind made well-known by Machin, who made use of 
arc-tangent addition formulae in order to reach the record computation of 100 digits of tt in 1706. 
For our purposes, a formula without negative signs is needed, the simplest of which 

(15) ^ = arctan ^-^ -I- arctan 

being especially suitable for a short program and is easily compiled in silico. With 10^ simulations, 
we obtained an estimate 7r/4 « 0.78598, to be compared to the exact value 7r/4 — 0.78539 • • • ; 
that is, an error of about 0.5 • 10^**, well within normal statistical fluctuations. The empirically 
measured average number of flips per generation of this Machin-like rB(7r/4) turned out to be 
about 6.45 coin flips. Figure [5] furthermore displays the empirical distribution of the number of 
coin flips, based on another campaign of 10^ simulations. The distribution of costs appears to 
have exponential tails matching fairly well the approximate formula P(C = fc) « 10"'^/^. 

Yet an alternative construction can be based on the arcsine and $3. Many variations are clearly 
possible, related to multiple or iterated integrals (use several bags). 

5. Experiments and conclusions 

We have built a complete protype implementation under the Maple symbolic manipulation 
system, in order to test and validate the ideas expounded above; see Figure [6] A generator such 
as ZA is specified as a composition of basic constructions, such as / exp(— /) [expn], / 1-^ ^/J 
[sqrtO], f ^ J f [intl], and so on. An interpreter using as source of randomness the built-in 
function random then takes this description and produces a {0, 1} result; this interpreter, which is 
comprised of barely 60 lines, contains from one to about a dozen intructions for each construction. 
In accordance with the general conditions of our contract, only simple register manipulations are 
ever used, so that a transcription in C or Java would be of a comparable size. 
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> 



> 



Z4 : =expn(compl (ave ( f lip, ave (intl ( intl (intl (even (prod( z , prod 
(X, y))), X, ONE), y, ONE), z, ONE ) , compl ( sqrtO ( intO ( ave ( logp 
(flip), sqrtO (prod (flip, intO(prod(Y, even (into (even (prod (X, 
X)), X, expn(prod(flip,Z))))), Y, expn(prod(f lip, flip) ))))) , 
Z, flip))))))): 
test(Z4, 10000) ; 



mean_number_of_f lips 



> val(Z4); 



103.1645000 
0.6313000000 



( 



4 + l«^)-{^ 



> evalf (val(Z4) ) ; 





1 


e 


4 














^ arctan 







0.6356033009 



Figure 6. Screen copy of a Maple session fragment showing: (i) the symbohc descrip- 
tion of a generator; (ii) a simulation of 10'* executions having a proportion of successes 
equal to 0.63130, with a mean number of flips close to 103; (in) the actual symbolic 
value of the probability of success and its numerical evaluation 0.63560 ■ • • . 



We see here that even a complicated constant such as the "value" of the probability associated 
with Z4 is effectively simulated with an error of 0.4 10"'^, which is consistent with normal statistical 
fluctuations. For such a complicated constant, the observed mean number of flips is here a little 
above 100. Note that the quantity ^(3) is produced (as well as retrieved automatically by Maple's 
symbolic engine!) from Beuker's triple integral: 
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C(3)- 



1 

Jo Jo 1 + 



dx dy dz. 



(On batches of 10^ experiments, that quantity alone only consumed an average of 6.5 coin flips, 
32'- 



whereas the analogous |iC(5) required barely 6 coin flips on average.) 



Note that the implementation is, by nature, freely extendible. Thus, given integration and our 



basic primitives (e.g., evcn(f) = TTj)' readily program an arc-tangent as a one-liner 



arctan(/) = / 



/ 



dx 



1 



and similarly for sine, arc-sine, erf, etc, with the symbolic engine automatically providing the 
symbolic and numerical values, as a validation. 

Here is flnally a table recapitulating 9 ways of building Buffon machines for tt related constants, 
with, for each of the methods, the value, and empirical average of the number of coin flips, as 
observed over 10^ simulations: 



(16) 



Li2(l/2) 


Rama 


arcsin 






arctan [1/2 -I- 1/3; 1] 


C(4) 


C(2) 




1 


TV 


TT 


TT 


TT TT 


77r^ 




24 


TT 


4 


8 


12 


4 8 


720 


12 


7.9 


10.8 


76.5 (co) 


16.2 


4.9 


4.5 26.7 (00) 


6.2 


7.2. 



(The tag "00" means that the expected cost of the simulation is infinite.) 
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